Mar 2nd, 2017

Xyinkl 3月 25, 2017

Artificial Intelligence 人工智能

Neighbourhood Watch 街区观察

@(TheEconomist)[英语, 翻译, 经济学人]


Millions of images of public streets offer a cheap, sweeping view of America’s demography
数以百万计的街景照片提供了低成本、全覆盖的美国人口调查视角

“WOULD it not be of great satisfaction to the king to know, at a designated moment every year, the number of his subjects?” A military engineer by the name of Sébastien le Prestre de Vauban posed this question to Louis XIV in 1686, pitching him the idea of a census. All France’s resources, the wealth and poverty of its towns and the disposition of its nobles would be counted, so that the king could control them better.

“如果国王每年都能在某个指定的时刻知悉其臣民的数目，岂不是一件令人欣慰的事？”1686年，一位名叫Sébastien le Prestre de Vauban的军事工程师向Louis XIV提出了这个问题，向他推销人口普查的想法。法国的所有资源、各城镇的贫富状况以及贵族们的倾向都将被统计在内，以便国王更好地掌控它们。

These days, such surveys are common. But they involve a lot of shoe-leather, and that makes them expensive. America, for instance, spends hundreds of millions of dollars every year on a socioeconomic investigation called the American Community Survey; the results can take half a decade to become available. Now, though, a team of researchers, led by Timnit Gebru of Stanford University in California, have come up with a cheaper, quicker method. Using powerful computers, machine-learning algorithms and mountains of data collected by Google, the team carried out a crude, probabilistic census of America’s cities in just two weeks.

如今，这类调查已很常见。但它们需要大量的实地走访，因此成本高昂。举例来说，美国每年在一项名为“美国社区调查”的社会经济调查上花费数亿美元，而调查结果可能要五年才能公布。不过，现在由加利福尼亚州斯坦福大学的Timnit Gebru领导的研究团队已经找到了一种更便宜、更快捷的方法。借助强大的计算机、机器学习算法以及谷歌收集的海量数据，该团队仅用两周就完成了一次粗略的、基于概率的美国城市普查。
【involve a lot of shoe-leather】n. Leather from which shoes are made that is worn out through walking.

First, the researchers trained their machine-learning model to recognise the make, model and year of many different types of cars. To do that they used a labelled data set, downloaded from automotive websites like Edmunds and Cars.com. Once the algorithm had learned to identify cars, it was turned loose on 50m images from 200 cities around America, all collected by Google’s Streetview vehicles, which provide imagery for the firm’s mapping applications. Streetview has photographed most of the public streets in America, and in among them the researchers spotted 22m different cars—around 8% of the number on America’s roads.

首先，研究团队用从Edmunds和Cars.com等汽车网站下载的带标注数据集，训练他们的机器学习模型识别多种汽车的品牌、型号和年份。算法学会识别汽车之后，便被用于处理来自美国200个城市的5000万张图片。这些图片全部由为谷歌地图应用提供影像的谷歌街景车采集。谷歌街景已经拍摄了美国大部分公共街道，研究团队在这些图片中识别出了2200万辆不同的汽车，约为美国道路上汽车总数的8%。
【recognise the make, model and year of】the make = the brand

The computer classified those cars into one of 2,657 categories it had learned from studying the Edmunds and Cars.com data. The researchers then took data from the traditional census, and split them in half. One half was fed to the machine-learning algorithm, so it could hunt for correlations between the cars it saw on the roads in those neighbourhoods and such things as income levels, race and voting intentions. Once that was done, the algorithm was tested on the other half of the census data, to see if these correlations held true for neighbourhoods it had never seen before. They did. The sorts of cars you see in an area, in other words, turn out to be a reliable proxy for all sorts of other things, from education levels to political leanings. Seeing more sedans than pickup trucks, for instance, strongly suggests that a neighbourhood tends to vote for the Democrats.

计算机将这些汽车逐一归入2657个类别之一，这些类别是它从Edmunds和Cars.com的数据中学到的。研究团队接着取来传统人口普查的数据，并将其一分为二。其中一半被输入机器学习算法，用来寻找这些街区路面上的车辆与收入水平、种族和投票意向等之间的关联。完成之后，再用另一半普查数据检验算法，看这些关联在它从未见过的街区是否依然成立。结果是成立的。换言之，一个地区出现的车辆类型，是从教育水平到政治倾向等各种事物的可靠替代指标。例如，如果一个街区的轿车多于皮卡，就强烈表明这个街区倾向于投票给民主党。
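
The split-half check described above can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the data are synthetic, the two features (`sedan_share`, `mean_car_age`), the target `dem_vote_share` and both function names are hypothetical, and ordinary least squares stands in for whatever model the researchers actually used.

```python
import numpy as np


def make_synthetic_neighbourhoods(n=400, seed=1):
    """Build made-up neighbourhoods; none of these numbers come from the study."""
    rng = np.random.default_rng(seed)
    sedan_share = rng.uniform(0.2, 0.8, size=n)    # fraction of cars that are sedans
    mean_car_age = rng.uniform(2.0, 12.0, size=n)  # average car age in years
    noise = rng.normal(0.0, 0.03, size=n)
    # Assumed relationship, invented purely so there is something to recover.
    dem_vote_share = 0.3 + 0.5 * sedan_share - 0.01 * mean_car_age + noise
    features = np.column_stack([sedan_share, mean_car_age])
    return features, dem_vote_share


def split_half_fit(features, target, seed=0):
    """Fit on a random half, then score the fit on the unseen half."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    order = rng.permutation(n)
    train, test = order[: n // 2], order[n // 2:]

    # Append a column of ones so the linear model has an intercept.
    x_train = np.column_stack([features[train], np.ones(len(train))])
    x_test = np.column_stack([features[test], np.ones(len(test))])

    # Least-squares fit on the training half only.
    weights, *_ = np.linalg.lstsq(x_train, target[train], rcond=None)

    # Correlation between predictions and truth on the held-out half.
    predicted = x_test @ weights
    held_out_r = np.corrcoef(predicted, target[test])[0, 1]
    return weights, held_out_r


if __name__ == "__main__":
    features, target = make_synthetic_neighbourhoods()
    weights, held_out_r = split_half_fit(features, target)
    print("weights (sedan_share, mean_car_age, intercept):", weights)
    print("held-out correlation:", held_out_r)
```

A high held-out correlation here only shows that the pattern planted in the synthetic data was recovered. The point of the design is that the score is computed on neighbourhoods the fit never saw, which is the same safeguard the article describes.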

The system has limitations: unlike a census, it generates predictions, not facts, and the more fine-grained those predictions are the less certain they become. The researchers reckon their system is accurate to the level of a precinct, an American political division that contains about 1,000 people. And because those predictions rely on the specific, accurate data generated by traditional surveys, it seems unlikely ever to replace them.

这一系统也有局限：与人口普查不同，它给出的是预测而非事实，而且预测的粒度越细，确定性就越低。研究人员认为，他们的系统可以精确到选区（precinct）一级，这是美国的一种政治区划，约有1000人。并且由于这些预测依赖传统调查产生的具体而准确的数据，它似乎不太可能取代传统调查。
【fine-grained】精细的

On the other hand, it is much cheaper and much faster. Dr Gebru’s system ran on a couple of hundred processors, a modest amount of hardware by the standards of artificial-intelligence research. It nevertheless managed to crunch through its 50m images in two weeks. A human, even one who could classify all the cars in an image in just ten seconds, would take 15 years to do the same.

另一方面，这一系统便宜得多，也快得多。Gebru博士的系统运行在两百个左右的处理器上，按人工智能研究的标准，这样的硬件规模并不大。然而它在两周内就处理完了5000万张图片。换作人工，即使能在短短十秒内完成一张图片中所有汽车的分类，也需要15年才能完成同样的工作。
【crunch】v. 发出碎裂声；嘎吱嘎吱地咀嚼；嘎喳嘎喳地碾过；（文中 crunch through）快速处理大量数据
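
The "15 years" figure can be checked with a few lines of arithmetic. This sketch assumes the hypothetical human works around the clock with no breaks, which the article does not state but which is the reading under which the numbers agree; the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def human_years(num_images, seconds_per_image):
    """Years needed to label every image, assuming nonstop work (an assumption)."""
    return num_images * seconds_per_image / SECONDS_PER_YEAR


def machine_images_per_second(num_images, days):
    """Average throughput implied by finishing num_images in the given days."""
    return num_images / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    # Figures from the article: 50m images, ten seconds each, two weeks.
    print(f"human: {human_years(50_000_000, 10):.1f} years")               # about 15.8
    print(f"machine: {machine_images_per_second(50_000_000, 14):.0f} images per second")  # about 41
```

Under that assumption the human takes roughly 15.8 years, consistent with the article's rounded figure, while the two-week run implies an average of about 41 images per second across all processors.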

The other advantage of the AI approach is that it can be re-run whenever new data become available. As Dr Gebru points out, Streetview is not the only source of information out there. Self-driving cars, assuming they catch on, will use cameras, radar and the like to keep track of their surroundings. They should, therefore, produce even bigger data sets. (Vehicles made by Tesla, an electric-car firm, are capturing such information even now.) Other kinds of data, such as those from Earth-imaging satellites, which Google also uses to refresh its maps, could be fed into the models, too. De Vauban’s “designated moment” could soon become a constantly updated one.

人工智能方法的另一个优点是，只要有新的数据可用，就可以重新运行。正如Gebru博士指出的，谷歌街景并不是唯一的信息来源。自动驾驶汽车如果流行起来，将使用摄像头、雷达等设备来追踪周围环境，因此应该会产生更大的数据集。（电动汽车公司Tesla制造的车辆现在已经在采集这类信息。）其他类型的数据，比如谷歌同样用来更新其地图的地球成像卫星数据，也可以输入模型。De Vauban的“指定时刻”可能很快就会变成一个不断更新的时刻。
【catch on】Become popular